Abstract

This paper investigates the pattern of damage, and its properties, of a
theoretical hail storm that gathers in intensity before subsiding and that
travels linearly across the landscape at constant velocity. We start from a
simpler model, that of a storm which does not move, restricted to an
uncorrelated binormal distribution of damage. Expressed in the natural polar
coordinates, this model leads to a 1-dimensional pattern of damage as a
function of the radial distance, whose marginal distribution is the
$\chi$-distribution with two degrees of freedom. We then extend the model to
the traveling form, allowing for a correlation of the variables, and further
to the multidimensional case. In its full generality the model produces
hyperellipsoidal hypersurfaces of equal intensity under the correlated
multinormal assumption. We provide closed-form solutions for the totality of
damages upon these hypersurfaces as proxies for the insurance claims to
follow.
1 Introduction

The United States Department of Agriculture (USDA) maintains a large crop
insurance program extending to billions of dollars [15]. Unfortunately, some
claims are bound to be fraudulent, and frequently they are related through
groups of farmers who act in collusion, extending to conspiring agents and
even insurance companies [11]. Naturally it is desirable to contain this
fraud, and there is a need for a good understanding of where actual storm
damage has occurred, and to what extent.



To gain a better understanding of hail storm damages, this study investigates
the damage to agricultural crops by hail storms, and the pursuant insurance
claims. Such claims routinely refer to the distance from the storm center,
and are known to respond to countervailing influences. Storm damage occurs
with greatest intensity at the center, tapering to insignificance at
distance. However, the total of claims filed for damage at the center is
small, and increases as more and more claimants reside at greater distances
from the center. The total claim value consequently increases from zero as a
function of distance to a single mode, and then decreases again to zero. The
research question, therefore, is, "What model based on fundamentals
faithfully replicates this experience?" The proposed distribution answers
this question with parsimony, and is herewith advanced.

This paper is organized as follows. The upcoming section analyzes the
log-normal distribution model, which was previously used to describe hail
storm damage [5][6]. The following section discusses the 2-dimensional case,
under the simplifying assumption that the hail storm does not move over the
landscape. The model is that of the independent bivariate normal probability
measure of damage intensity. Insofar as damage intensity is independent of
direction from the storm center there is only one independent variable, the
radial distance from the center. The resulting marginal distribution on the
identity random variable of radius is the $\chi$-distribution. In the next
section we extend the model to the traveling form, introducing dependence in
the bivariate normal probability measure, and subsequently extend this to the
multivariate case. The final phase of the study applies the model to
extensive data sets of hail events and their 'severe probabilities,' as
detected by the NEXRAD network of weather radars.



2 A log-normal distribution model 

Hail storms can give rise to various forms of damage, including damage to
motor vehicles [13] and to agriculture. In the context of agriculture, the
log-normal distribution has been used to describe insurance claim data
[5][6]. Although this distribution fits the data reasonably well, we show
that there is a theoretical objection to using the log-normal distribution.

The log-normal distribution with parameters $\mu$ and $\sigma$ has density
function [2]

$$ g_R(r) = g_R(r; \mu, \sigma) = \frac{1}{r \sigma \sqrt{2\pi}} \exp\left\{ -\frac{(\ln r - \mu)^2}{2\sigma^2} \right\}, \qquad r > 0, \tag{1} $$

and distribution function

$$ G_R(r) = G_R(r; \mu, \sigma) = N\left( \frac{\ln r - \mu}{\sigma} \right), $$

with $N$ the standard normal distribution function.

Suppose that $g_R(r)$ is the marginal probability density function in the
radial direction of some joint density $g(r, \varphi)$ of random variables
$R$ and $\Phi$.
The damage density at the center can be expressed as the average density
over a small disc centered at the center, that is,

$$ \lim_{\varepsilon \downarrow 0} \frac{1}{\pi \varepsilon^2} \int_0^{2\pi} \int_0^{\varepsilon} g(r, \varphi) \, dr \, d\varphi = \lim_{\varepsilon \downarrow 0} \frac{G_R(\varepsilon)}{\pi \varepsilon^2} = 0, $$

where the last equality follows by applying l'Hôpital's rule. In other words,
the log-normal distribution corresponds to a damage pattern with zero damage
density in the center, which is unlikely to be the case for a hail storm.
This might, however, be desirable for other kinds of storms, like tornados
and hurricanes.
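
The vanishing central density can also be observed numerically. The
following Python sketch evaluates the average density over discs of
shrinking radius for a log-normal radial distribution. The values mu = 0 and
sigma = 1, and the function names, are illustrative assumptions and are not
fitted to any data.

```python
import math

def lognormal_cdf(r, mu=0.0, sigma=1.0):
    """Log-normal distribution function N((ln r - mu) / sigma)."""
    z = (math.log(r) - mu) / sigma
    # erfc keeps precision in the far lower tail, where 1 + erf(z) would round to 0.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def average_disc_density(eps, mu=0.0, sigma=1.0):
    """Probability mass inside the disc of radius eps, divided by its area."""
    return lognormal_cdf(eps, mu, sigma) / (math.pi * eps ** 2)

# Illustrative parameters only (assumption): mu = 0, sigma = 1.
radii = [1e-1, 1e-2, 1e-3, 1e-4]
densities = [average_disc_density(eps) for eps in radii]
for eps, dens in zip(radii, densities):
    print(f"eps = {eps:g}: average density = {dens:.3e}")
```

The printed averages shrink rapidly as the disc radius decreases, in line
with the limit discussed in the text.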

3 A binormal damage pattern and the $\chi$-distribution

If the log-normal distribution is unfit for describing damages, what other
distribution is suitable? We make the following desirable assumptions about
the damage pattern of a hail storm. The damage function is unimodal at the
center, smooth, dependent only on the distance from the center, and scalable
to a probability density function.

The simplest distribution with these attributes is the standard bivariate
normal, or simply binormal, distribution. We consider the standard
probability space $(\mathbb{R}^2, \mathcal{B}, P)$, wherein the first
component is the Euclidean plane, the second the Borel sigma algebra, and the
third the independent binormal probability measure. Equip the plane with
Cartesian coordinates $(x, y)$ and polar coordinates $(r, \theta)$ and define
a random variable $R$ as the identity function on the radial coordinate $r$,
independent of $\theta$. Thus $R$ graphs to an inverted cone with apex at the
origin of the plane. One also may define the random variable $\Theta$ as the
identity on the angular coordinate $\theta$, independent of $r$. This
variable has the uniform distribution.

The usual Euclidean expression of the density of the binormal distribution,
founded on the identity random variables $(X, Y)$ on the respective axes with
variables $(x, y)$, is

$$ f(x, y) = \frac{1}{2\pi} \exp\left\{ -\frac{x^2 + y^2}{2} \right\}. $$

The corresponding polar expression is

$$ g(r) = \frac{1}{2\pi} \, r \exp\left\{ -\frac{r^2}{2} \right\}. $$

That $g(r(x, y))$ induces $P$ is clear, for

$$ \int_0^{2\pi} \int_0^{\infty} \frac{1}{2\pi} \, r \exp\left\{ -\frac{r^2}{2} \right\} dr \, d\theta = \frac{2\pi}{2\pi} = 1. \tag{3} $$



Our attention turns to the distribution of the storm damage as a function of
distance from the storm center, insofar as the intensity is independent of
the direction from the center. The marginal distribution of $R$ in these
circumstances is

$$ G(r) = \Pr\{R \le r\} = \frac{2\pi}{2\pi} \int_0^r s \exp\left\{ -\frac{s^2}{2} \right\} ds = 1 - \exp\left\{ -\frac{r^2}{2} \right\}. $$

This is the familiar $\chi$-distribution with two degrees of freedom. The
density $g(r)$ is the integrand on $r$ in Equation (3).
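
As a sanity check on this marginal distribution, the following Python sketch
draws points from the standard independent binormal distribution and
compares the empirical distribution of their radial distances with
1 - exp(-r^2/2). The sample size, the seed, and the evaluation grid are
arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw points from the standard independent binormal distribution.
xy = rng.standard_normal((200_000, 2))
radius = np.hypot(xy[:, 0], xy[:, 1])

def radial_cdf(r):
    """Chi distribution function with two degrees of freedom."""
    return 1.0 - np.exp(-0.5 * r ** 2)

# Compare empirical and closed-form values on an arbitrary grid of radii.
grid = np.array([0.5, 1.0, 1.5, 2.0, 3.0])
empirical = np.array([(radius <= r).mean() for r in grid])
max_gap = np.max(np.abs(empirical - radial_cdf(grid)))
print("largest gap between empirical and closed form:", max_gap)
```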

4 Traveling form of the hail storm damage model 

Let us assume that at any moment in time, the damage density $D_c(x)$ at the
location $x \in \mathbb{R}^2$ of a hail storm is binormally distributed, that
is,

$$ D_c(x) = \frac{1}{2\pi} e^{-\frac{1}{2} \|x - c\|^2}, $$

where $c \in \mathbb{R}^2$ is the center of the hail storm and $\|\cdot\|$ is
the Euclidean norm. During the storm, let us assume that the center moves
with a constant velocity vector $v \in \mathbb{R}^2$. Choosing coordinates
$x$ such that the center is at the origin at time $t = 0$, the trajectory of
the center is then given by $c = tv$. The intensity $I(t)$ of the storm at
time $t$ is assumed to be normal,

$$ I(t) = \frac{1}{\sqrt{2\pi} \, \sigma} \exp\left\{ -\frac{t^2}{2\sigma^2} \right\}, $$

with the time coordinate chosen such that the peak intensity happens at time
$t = 0$. After scaling the time coordinate by a factor $\sigma$, we can
assume that $\sigma = 1$.

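Under these assumptions, the total damage density $T(x)$ at the point $x$ is
given by the marginal density

$$ T(x) = \int_{-\infty}^{\infty} I(t) \, D_{tv}(x) \, dt = \frac{1}{(2\pi)^{3/2}} \int_{-\infty}^{\infty} \exp\left\{ -\frac{1}{2} t^2 - \frac{1}{2} \|x - tv\|^2 \right\} dt. $$

The integral can be computed by completing the square. Writing

$$ \alpha := \sqrt{1 + \|v\|^2}, \qquad s := \alpha t - \frac{\langle x, v \rangle}{\alpha}, $$

with $\langle \cdot, \cdot \rangle$ the standard inner product, one finds
that the total damage

$$ T(x) = \frac{1}{2\pi\alpha} \exp\left\{ -\frac{1}{2} \left( \|x\|^2 - \frac{\langle x, v \rangle^2}{\alpha^2} \right) \right\} $$

is also binormally distributed, but now with a correlation in its random
vector. To bring this density in standard form, write $v = (v_1, v_2)$ and
introduce the parameters

$$ \sigma_1 := \sqrt{1 + v_1^2}, \qquad \sigma_2 := \sqrt{1 + v_2^2}, \qquad \rho := \frac{v_1 v_2}{\sigma_1 \sigma_2}. $$

Then

$$ T(x) = \frac{1}{2\pi \sqrt{\det(S)}} \exp\left\{ -\frac{1}{2} x^T S^{-1} x \right\}, $$

which is the standard form of the bivariate normal distribution with zero
mean and covariance matrix

$$ S = \begin{bmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{bmatrix}. $$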
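
The result of completing the square can be cross-checked numerically. The
Python sketch below integrates the product of the normal intensity and the
moving binormal damage density over time, and compares the outcome with a
zero-mean bivariate normal density whose covariance matrix is taken to be
the identity plus the outer product of v with itself (entries 1 + v_1^2,
v_1 v_2, 1 + v_2^2). The velocity vector and the test points are
hypothetical values chosen for illustration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import multivariate_normal

def total_damage_numeric(x, v):
    """Integrate I(t) * D_{tv}(x) over all times t, with sigma = 1."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)

    def integrand(t):
        intensity = np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)
        diff = x - t * v
        damage = np.exp(-0.5 * diff @ diff) / (2.0 * np.pi)
        return intensity * damage

    value, _ = quad(integrand, -np.inf, np.inf)
    return value

def total_damage_closed_form(x, v):
    """Zero-mean bivariate normal density with covariance I + v v^T."""
    v = np.asarray(v, dtype=float)
    cov = np.eye(2) + np.outer(v, v)
    return multivariate_normal(mean=np.zeros(2), cov=cov).pdf(x)

# Hypothetical velocity vector and evaluation points (assumptions).
v = [0.8, -0.5]
points = [[0.0, 0.0], [1.0, 0.5], [-2.0, 1.5]]
errors = [abs(total_damage_numeric(p, v) - total_damage_closed_form(p, v))
          for p in points]
print("absolute differences:", errors)
```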
5 The marginal distribution at a distance

The previous section suggests that the damage distribution carries a symmetry
in the angular direction. Marginalizing the damage distribution in the radial
direction, this symmetry can be used to decrease the dimension of the damage
distribution by 1. We choose to perform this process for a general
multinormal distribution, as this is not much harder than the bivariate case
and might be used for other modeling purposes.

Suppose a random vector $X$ on $\mathbb{R}^n$ is multivariate normally
distributed with density function

$$ \Gamma_X(x) = \frac{1}{(2\pi)^{n/2} \sqrt{\det(\Sigma)}} \exp\left\{ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right\} $$

with mean $\mu$ and covariance matrix $\Sigma$ of full rank $n$. The density
partitions Euclidean $n$-space into level hypersurfaces with constant
probability density. Since the covariance matrix $\Sigma$ is symmetric
positive definite, it admits an orthogonal diagonalization

$$ \Sigma = Q^T D Q, \qquad Q := [v_1, \dots, v_n]^T, \qquad D := \operatorname{diag}\{a_1^2, \dots, a_n^2\}. $$

The level hypersurfaces form a family of hyperellipsoids with center $\mu$,
semi-axis lengths $a_1, \dots, a_n$ in constant proportion
$[a_1 : \cdots : a_n]$, and directions of the principal axes given as
corresponding eigenvectors $v_1, \dots, v_n$ of the covariance matrix
$\Sigma$. Transforming to a random vector $Y$ by the change of coordinates
$y := Q(x - \mu)$ yields the probability density function

$$ \Gamma_Y(y) = \Gamma_X(Q^T y + \mu) = \frac{1}{(2\pi)^{n/2} \, a_1 \cdots a_n} \exp\left\{ -\frac{1}{2} \sum_{i=1}^n \frac{y_i^2}{a_i^2} \right\} $$
of $Y$. Changing to hyperspherical coordinates by the map

$$ (0, \infty) \times [0, \pi]^{n-2} \times [0, 2\pi) \longrightarrow \mathbb{R}^n $$

defined by

$$ z = \begin{bmatrix} r \\ \varphi_1 \\ \vdots \\ \varphi_{n-1} \end{bmatrix} \longmapsto y = r \begin{bmatrix} a_1 \cos(\varphi_1) \\ a_2 \sin(\varphi_1) \cos(\varphi_2) \\ \vdots \\ a_{n-1} \sin(\varphi_1) \cdots \sin(\varphi_{n-2}) \cos(\varphi_{n-1}) \\ a_n \sin(\varphi_1) \cdots \sin(\varphi_{n-2}) \sin(\varphi_{n-1}) \end{bmatrix} $$

yields a random vector $Z$ with probability density function

$$ \Gamma_Z(z) = \Gamma_Y(y) \left| \det \frac{\partial(y_1, y_2, \dots, y_n)}{\partial(r, \varphi_1, \dots, \varphi_{n-1})} \right| = \frac{1}{(2\pi)^{n/2}} \, r^{n-1} \sin^{n-2}(\varphi_1) \sin^{n-3}(\varphi_2) \cdots \sin(\varphi_{n-2}) \exp\left\{ -\frac{r^2}{2} \right\} $$

that respects the foliation by hyperellipsoids.

Marginalizing out the angular random variables, one is left with the
marginal radial random variable $R$ with marginal probability density
function

$$ \Gamma_R(r) = \int_0^{\pi} \cdots \int_0^{\pi} \int_0^{2\pi} \Gamma_Z(r, \varphi_1, \dots, \varphi_{n-1}) \, d\varphi_{n-1} \, d\varphi_{n-2} \cdots d\varphi_1 = \frac{1}{2^{n/2 - 1} \, \Gamma(n/2)} \, r^{n-1} \exp\left\{ -\frac{r^2}{2} \right\}, $$

where we used that the surface area of the unit sphere of dimension $n - 1$
is $2\pi^{n/2} / \Gamma(n/2)$, with $\Gamma$ the gamma function. One
recognizes $\Gamma_R(r)$ as the density function of the $\chi$-distribution
with $n$ degrees of freedom. One hits the interior of the hyperellipsoid
defined by $y_1^2 / a_1^2 + \cdots + y_n^2 / a_n^2 = r^2$ with probability

$$ \Pr(0 \le R \le r) = \int_0^r \Gamma_R(s) \, ds = P(n/2, r^2/2), $$

where $P$ is the regularized Gamma function [1, §6.5.1].
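
The closed form in terms of the regularized Gamma function can be checked by
simulation. The Python sketch below samples from a correlated multinormal
distribution, computes for each sample the radial coordinate r (the square
root of the quadratic form of the centered sample with the inverse
covariance matrix), and compares the empirical probabilities with
P(n/2, r^2/2). The dimension, mean, covariance matrix, and seed are
hypothetical choices for illustration.

```python
import numpy as np
from scipy.special import gammainc  # regularized lower incomplete gamma P(a, x)
from scipy.stats import chi

rng = np.random.default_rng(1)

# Hypothetical correlated multinormal distribution in n = 3 dimensions.
n = 3
mu = np.array([1.0, -2.0, 0.5])
a = rng.standard_normal((n, n))
sigma = a @ a.T + n * np.eye(n)  # symmetric positive definite by construction

samples = rng.multivariate_normal(mu, sigma, size=100_000)
centered = samples - mu
# Radial coordinate: sqrt((x - mu)^T Sigma^{-1} (x - mu)).
radius = np.sqrt(np.einsum("ij,jk,ik->i", centered, np.linalg.inv(sigma), centered))

grid = np.array([0.5, 1.0, 2.0, 3.0])
closed_form = gammainc(n / 2.0, grid ** 2 / 2.0)
empirical = np.array([(radius <= r).mean() for r in grid])
print("closed form:", closed_form)
print("empirical:  ", empirical)
```

The closed-form values also agree with the distribution function of the
chi distribution as implemented in `scipy.stats.chi`.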

For $n = 2$ we recover the hail storm setting. To evaluate insurance claims
it is helpful to compare, at the point $x$, the reported total damage to the
expected total damage. Since the latter quantity is, by definition, constant
along the level curve through $x$, it is tempting to reduce the dimension of
the problem by considering the marginal distribution in the radial direction,
which has density function

$$ \Gamma_R(r) = r \exp\left\{ -\frac{r^2}{2} \right\}, $$

corresponding to the $\chi$-distribution with two degrees of freedom, also
known as the Rayleigh distribution. The total damage within the ellipse
defined by $y_1^2 / a_1^2 + y_2^2 / a_2^2 = r^2$ takes on the particularly
simple form

$$ \Pr(0 \le R \le r) = \int_0^r \Gamma_R(s) \, ds = 1 - \exp\left\{ -\frac{r^2}{2} \right\}. $$
As we shall see in the next section, even though the log-normal distribution
depends on an additional parameter, it only has a slightly better overall fit
than the $\chi$-distribution. In addition its density is too low near the
center and its tail is too fat.

6 Fitting the model to data 

In this section we fit our model to a data set of hail events, as estimated
by the Next Generation Weather Radar system (NEXRAD) network [3]. Distributed
throughout the United States and selected overseas locations, over a hundred
weather radars measure the reflectivity, mean radial velocity, and spectrum
width. These meteorological base data quantities are used to search for
patterns that estimate the presence, and likelihood, of various kinds of
severe weather events. One of the data sets derived from this processing is
the Hail Index Overlay, which is designed to locate storms with the potential
to produce hail. This data set is organized as a collection of hail events
and the probability that the event is severe, which can be thought of as a
potential intensity of the hail event. The National Climatic Data Center
makes these hail events publicly available through the Severe Weather Data
Inventory [8].

We are, however, not interested in single hail events, but in hail storms.
Experimenting with various hierarchical agglomerative clustering methods
convinced us that the single-linkage distance gives rise to clusters closely
resembling our own intuitive notion of a storm. Using R [10] and in
particular the package flashClust [4], we compute the hierarchical clustering
tree from a large collection of hail events in January 2010. See Murtagh [7]
for the details of the underlying algorithm. A priori we do not know how many
storms to expect. Following a rule of thumb, we cut the dendrogram when the
next merging gives rise to a disproportionate jump in the clustering
criterion. In this manner, we clustered the hail events in the month of
January into several storms. We chose one representative storm that was not
too large, from January 20, 2010, which is listed in Table 1 and shown in
Figure 1 on top of a map of the vicinity of Laurel, Mississippi [9].
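
The clustering step can be sketched as follows. The study itself used R
with the flashClust package; the Python sketch below substitutes SciPy's
single-linkage implementation and reads the rule of thumb as "cut at the
largest jump between successive merge heights", which is only one possible
interpretation. The event locations are synthetic stand-ins, not NEXRAD
data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(2)

# Synthetic stand-ins for hail event locations [longitude, latitude].
storm_a = rng.normal([-89.1, 31.7], 0.05, size=(30, 2))
storm_b = rng.normal([-95.0, 35.0], 0.05, size=(20, 2))
events = np.vstack([storm_a, storm_b])

# Single-linkage hierarchical clustering tree.
tree = linkage(events, method="single")
heights = tree[:, 2]  # merge heights, nondecreasing for single linkage

# Cut the dendrogram halfway across the largest jump in merge height.
jumps = np.diff(heights)
k = int(np.argmax(jumps))
threshold = 0.5 * (heights[k] + heights[k + 1])
labels = fcluster(tree, t=threshold, criterion="distance")
n_storms = len(np.unique(labels))
print("number of storms found:", n_storms)
```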

The events appear relatively near each other and far from either pole,
implying that we can approximately treat the longitude and latitude as
Cartesian coordinates. Let us assume that the locations $x_i$ of the hail
events in Table 1 are sampled from a binormal distribution with density
$\Gamma_X$ as in Section 5 with $n = 2$. Each $x_i$ comes with a severe
probability $p_i$ that is interpreted as a weight of the event. The center
$\mu$ and covariance matrix $\Sigma$ of the storm can be estimated by the
maximum likelihood method, as

$$ \hat{\mu} = \frac{1}{\sum_i p_i} \sum_i p_i x_i, \tag{5} $$

$$ \hat{\Sigma} = \begin{bmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{bmatrix} = \frac{1}{\sum_i p_i} \sum_i p_i (x_i - \hat{\mu})(x_i - \hat{\mu})^T. \tag{6} $$

Table 1: Hail events belonging to a hail storm on January 20, 2010, in the
vicinity of Laurel, Mississippi. Each of the 46 hail events lists a time
$t_i$, a location $x_i$ as a column vector [longitude, latitude]$^T$, and a
severe probability $p_i$.



The resulting fitted binormal distribution is depicted in Figure 1 by some
of its contour lines.
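
A minimal sketch of the estimation step, under the reading that Equations
(5) and (6) are the weighted sample mean and the weighted sample covariance
with the severe probabilities as weights. The function name and the four
example events are hypothetical and do not come from Table 1.

```python
import numpy as np

def weighted_binormal_fit(locations, weights):
    """Weighted mean and covariance of event locations.

    locations: array of shape (m, 2) with rows [longitude, latitude].
    weights:   array of shape (m,) with the severe probabilities.
    """
    x = np.asarray(locations, dtype=float)
    p = np.asarray(weights, dtype=float)
    mu = (p[:, None] * x).sum(axis=0) / p.sum()
    centered = x - mu
    # Sum of p_i * (x_i - mu)(x_i - mu)^T, normalized by the total weight.
    sigma = (p[:, None] * centered).T @ centered / p.sum()
    return mu, sigma

# Hypothetical events (not taken from Table 1).
locations = np.array([[-89.2, 31.6], [-89.1, 31.7], [-89.0, 31.7], [-89.1, 31.8]])
weights = np.array([0.2, 0.9, 0.5, 0.4])
mu_hat, sigma_hat = weighted_binormal_fit(locations, weights)
print("center:", mu_hat)
print("covariance:\n", sigma_hat)
```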

Figure 1: Drawn as points on top of a map of the vicinity of Laurel,
Mississippi, are the hail events from Table 1, with sizes proportional to
their severe probabilities. In addition, contour lines of a fitted binormal
distribution are drawn.

By the discussion of the previous section, the marginal distribution in the
radial direction is the $\chi$-distribution with two degrees of freedom.
Because the pair $(a_1, a_2)$ of semi-axes is only defined up to
multiplication by a constant, we can consider a family of $\chi$-like
distributions

$$ F(r; \lambda) = 1 - \exp\left\{ -\frac{1}{2} \lambda^2 r^2 \right\}, \qquad r \ge 0, $$

parametrized by $\lambda > 0$.

To find the estimator $\hat{\lambda}$ of the parameter $\lambda$ that fits
our data best, we reorder the data by distance from the center. Such a
distance function should be zero at the center and constant along the level
curves of $\Gamma_X(x)$. It is easily checked that the function
$d : \mathbb{R}^2 \to [0, \infty)$ defined by

$$ d(x) = \sqrt{(x - \hat{\mu})^T \hat{\Sigma}^{-1} (x - \hat{\mu})} \tag{7} $$

has these properties. In the case of the standard binormal distribution with
$\mu = 0$ and $\Sigma$ the identity matrix, this is the ordinary Euclidean
distance to the origin.
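
A distance with these properties can be sketched in Python as below, under
the assumption that the distance function (7) is the square root of the
quadratic form of the centered location with the inverse of the fitted
covariance matrix. The function name and the example values are
illustrative.

```python
import numpy as np

def storm_distance(x, mu, sigma):
    """sqrt((x - mu)^T sigma^{-1} (x - mu)) for one or several locations x."""
    diff = np.atleast_2d(np.asarray(x, dtype=float)) - np.asarray(mu, dtype=float)
    # Solve sigma * y = diff^T instead of forming the inverse explicitly.
    solved = np.linalg.solve(np.asarray(sigma, dtype=float), diff.T).T
    return np.sqrt(np.einsum("ij,ij->i", diff, solved))

# Standard case: reduces to the Euclidean distance to the origin.
euclid = storm_distance([3.0, 4.0], np.zeros(2), np.eye(2))[0]

# Two points on the same level ellipse x^2/4 + y^2 = 1 (illustrative values).
ellipse = storm_distance([[2.0, 0.0], [0.0, 1.0]], np.zeros(2), np.diag([4.0, 1.0]))
print(euclid, ellipse)
```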

Let $\pi$ be a permutation of the indices of the hail events for which
$(d(x_{\pi(i)}))_i$ becomes a nondecreasing sequence of distances.
Estimating the parameter $\lambda = \hat{\lambda}$ for which
$F(r; \lambda)$ is the best fit of our data can be done by solving the
nonlinear least squares problem

$$ \hat{\lambda} := \operatorname*{argmin}_{\lambda > 0} \sum_i \left[ F(d(x_{\pi(i)}); \lambda) - \frac{\sum_{j \le i} p_{\pi(j)}}{\sum_j p_j} \right]^2. \tag{8} $$

Solving this problem numerically using Sage [14], we find that the sum of
squares reaches its minimum of 0.067 at $\hat{\lambda} \approx 7.308$.
Similarly a best fitting log-normal distribution can be found by numerically
solving the nonlinear least squares problem

$$ (\hat{\mu}, \hat{\sigma}) := \operatorname*{argmin}_{(\mu, \sigma) \in \mathbb{R} \times (0, \infty)} \sum_i \left[ G_R(d(x_{\pi(i)}); \mu, \sigma) - \frac{\sum_{j \le i} p_{\pi(j)}}{\sum_j p_j} \right]^2. \tag{9} $$

One finds that the sum of squares reaches its minimum of 0.0483 at
$\hat{\mu} \approx -1.862$ and $\hat{\sigma} \approx 0.6227$.
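
The fit of the one-parameter family can be sketched in Python as follows.
The study used Sage; this sketch substitutes SciPy's bounded scalar
minimizer and assumes that the empirical distribution is the cumulative sum
of the weights, normalized to end at one, over the events ordered by
distance. The data are synthetic Rayleigh draws with equal weights, so the
recovered parameter should land near the value used to generate them.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fit_chi_like(distances, weights):
    """Least-squares fit of F(r; lam) = 1 - exp(-lam^2 r^2 / 2)."""
    order = np.argsort(distances)  # plays the role of the permutation pi
    d = np.asarray(distances, dtype=float)[order]
    p = np.asarray(weights, dtype=float)[order]
    empirical = np.cumsum(p) / p.sum()  # assumed weighted empirical distribution

    def sum_of_squares(lam):
        model = 1.0 - np.exp(-0.5 * lam ** 2 * d ** 2)
        return float(np.sum((model - empirical) ** 2))

    # Search interval (0.01, 20) is an arbitrary assumption for this example.
    result = minimize_scalar(sum_of_squares, bounds=(0.01, 20.0), method="bounded")
    return result.x, result.fun

# Synthetic data: Rayleigh distances with scale 1 / true_lambda, equal weights.
rng = np.random.default_rng(3)
true_lambda = 2.0
distances = rng.rayleigh(scale=1.0 / true_lambda, size=2000)
weights = np.ones_like(distances)
lam_hat, sse = fit_chi_like(distances, weights)
print("estimated lambda:", lam_hat, "sum of squares:", sse)
```

The log-normal fit works the same way with a two-parameter minimizer in
place of the scalar one.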

Comparing sums of squares, the log-normal distribution has a slightly better
overall fit than the $\chi$-distribution, which is to be expected because of
its additional parameter. Plotting the residuals of the fitted
$\chi$-distribution and log-normal distribution shows that they are
approximately normally distributed. The F-test of the equality of two
variances yields an F-statistic of approximately 0.067/0.0483 with
corresponding p-value 0.142, taking into account the additional parameter of
the log-normal distribution. The null hypothesis of equality of variance can
therefore not be rejected at the 10% significance level.
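
The comparison can be sketched in Python as below. The degrees of freedom
are not stated in the text; the sketch assumes 46 - 1 for the one-parameter
fit and 46 - 2 for the two-parameter log-normal fit, and takes the upper
tail of the F distribution. Under different assumptions the p-value will
differ somewhat.

```python
from scipy.stats import f

# Minimal sums of squares reported for the two fits.
sse_chi = 0.067
sse_lognormal = 0.0483

# Assumed degrees of freedom: 46 events minus the number of fitted parameters.
n_events = 46
df_chi = n_events - 1
df_lognormal = n_events - 2

f_statistic = sse_chi / sse_lognormal
p_value = f.sf(f_statistic, df_chi, df_lognormal)  # upper-tail probability
print("F =", f_statistic, "p =", p_value)
```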

Figure 2 simultaneously shows the empirical distribution for the distance
function (7), the best-fitted $\chi$-distribution and best-fitted log-normal
distribution. Qualitatively, the fitted log-normal distribution is too low
near the origin, confirming the discussion in Section 2, and its tail seems
to be too fat for the data. This can be seen more clearly from the Q-Q plot
in Figure 3. Note that the $\chi$-distribution is also too low near the
origin, but slightly better than the log-normal distribution.

Finally, let us note some limitations of the model. In order to approximate
longitude and latitude by Cartesian coordinates, the storm cannot be too
large. In addition, for the Coriolis effect to be negligible, the storm
cannot last too long. When using these hail intensities as proxies for
damage claims, the underlying geography should be homogeneous. This is for
instance the case with large-scale corn field agriculture. Moreover, our
model does not reflect that different types of hail storms can cause
different types of damage [12]. For instance, larger hail stones are more
likely to damage motor vehicles, while hail storms with small but numerous
hail stones have a greater damaging effect on crops.

Figure 2: The empirical distribution for the distance function (7), together
with a fitted $\chi$-distribution (drawn solid) and a fitted log-normal
distribution (drawn dashed), found by solving the nonlinear least squares
problems (8) and (9).

Figure 3: A Q-Q plot comparing the empirical distribution on the vertical
axis to the fitted $\chi$-distribution (+) and the fitted log-normal
distribution (o) on the horizontal axis.